Model-based safety analysis approaches aim at finding critical failure combinations by analysis of 
models of the whole system (i.e. software, hardware, failure modes and environment). The advantage 
of these methods compared to traditional approaches is that the analysis of the whole system gives 
more precise results. 

Only a few model-based approaches have been applied to answer quantitative questions in safety 
analysis, and these are often limited to the analysis of specific failure propagation models, to limited types of failure 
modes, or to models without system dynamics and behavior, as direct quantitative analysis requires large amounts 
of computing resources. New achievements in the domain of (probabilistic) model checking now 
allow this problem to be overcome. 

This paper shows how functional models based on synchronous parallel semantics, which can 
be used for system design, implementation and qualitative safety analysis, can be directly re-used 
for (model-based) quantitative safety analysis. Accurate modeling of different types of probabilistic 
failure occurrence is shown as well as accurate interpretation of the results of the analysis. This 
allows for reliable and expressive assessment of the safety of a system in early design stages. 

1 Introduction 

With rising complexity, larger machinery and higher power consumption, more and more systems become 
safety critical. At the same time, the amount of software involved is growing rapidly, which 
increases the difficulty of building reliable and safe systems. 

To counter this evolution, safety analysis has become a focus in many engineering disciplines. Requirements 
for the development and life cycle of safety-critical systems are now specified in many different 
norms like the general IEC 61508 [27], DO-178B [36] for aviation or ISO 26262 [1] for automotive. 
Although the standards address very different application domains, they all require some sort of safety 
assessment before a system is put into operation, and they require the use of formal methods for systems in 
high-risk areas. 

Safety assessments can typically be divided into two groups: qualitative and quantitative assessments. 
Qualitative analysis methods like FTA (fault tree analysis) BTl, FMEA (failure modes and effects analysis) 
[28] or HazOp (hazard and operability analysis) [30] are used to determine causal relationships 
between the failure of individual components and system loss [32]. 

These methods emerge from a long expertise in building safety-critical systems. Their disadvantage 
is that they mainly rely on the skill and expertise of the safety engineer. A potential safety risk will only 
be anticipated if the engineer "foresees" it at design time. This becomes ever harder because of rising 
hardware and software complexity. 

A new trend is to advance the analysis methods to a model-based level. This means that a model of 
the system under consideration as well as of its environment is built. The (safety) analysis is then not only 
grounded on the engineer's skill but also on the analysis of the model. In this way some causes of hazards 
can be found much earlier. Errors found at early design stages are easier to remove, and redesign is less 
costly. Some examples of such methods are explained in [34, 3, 32, 9]; they allow for the semi-automatic 
deduction of cause-consequence relationships between component failures and loss of the system. 

However, qualitative analysis methods alone are not sufficient for norm adherence (and for showing 
an adequate level of safety). Another main criterion for safety is the answer to the question: "What 
are the probabilities of the system's hazards?" This is addressed by quantitative analysis. 

Most quantitative approaches use either the results of prior qualitative methods or a specific (probabilistic) 
model of only failure effects and cascading, without the functional part of the system. This can 
lead to pessimistic estimations of the actual hazard probabilities. Newer methods like [11, 14, 6] apply 
probabilistic model-checking techniques to overcome this problem by analyzing system models 
with both failure and functional behavior. These methods use continuous-time semantics for the underlying 
stochastic models, which is well suited to asynchronous interleaved systems [19]. On the 
other hand, many safety-critical systems are developed using synchronous parallel discrete systems like 
clocked bus systems and processing units¹. Such synchronous parallel systems are better expressed in a 
discrete-time model [19]. 

We propose a method for probabilistic model-based safety analysis for synchronous parallel sys- 
tems. We show how different types of failures, in particular per-time and per-demand, can be accurately 
modeled and how such systems can be analyzed using probabilistic model checking. 

The next section (Sect. 2) introduces a small case study and gives a very short introduction to our 
qualitative model-based safety analysis method. Sect. 3 shows how accurate probabilistic failure modeling 
can be achieved. Sect. 4 shows how probabilistic models can be accurately analyzed. Sect. 5 gives 
an overview of related work in the field, and Sect. 6 concludes the paper. 

2 Case Study 

For illustration purposes we use a small case study taken from the literature [41]. Only a brief overview 
of model-based (qualitative) safety analysis is given here; more details can be found in [32]. In short, the 
qualitative model-based safety analysis consists of the following steps: 

1. Formal Modeling of the functional system 

2. Integration of the direct failure effects and failure automata forming the extended system model 

3. Computation of all minimal critical sets using deductive cause consequence analysis (DCCA) 

2.1 System Model of Case Study 

The case study consists of two redundant input sensors (S1 and S2) measuring some input signal (I). This 
signal is then processed in an arithmetic unit to generate the desired output signal. Two arithmetic units 
exist, a primary unit (A1) and its backup unit (A2). The primary unit gets an input signal from both input 
sensors, the backup unit only from one of the two sensors. The sensors deliver a signal every 10 ms. If 
the primary unit (A1) produces no output signal, then a monitoring unit (M) switches to the backup unit 
(A2) for the generation of the output signal. The backup unit should only produce outputs when it has 
been triggered by the monitor. The case study is depicted in Fig. 1. 

¹ For example, SCADE Suite, which is based on the synchronous data-flow language LUSTRE [17], is widely used in industry 
for the development of safety-critical systems, as it has IEC 61508 and DO-178B certified code generators. 
Figure 1: Backup Equipped System 

The system model is based on finite state machines which operate in a discrete-time synchronous 
parallel way. We use a graphical representation of the modules of the actual model, which is written 
for the SMV [29] model checker. The initial state is marked with two circles; the transition labels are 
predicates. The state of each module is exported as output so that it is visible in the other modules. The 
modeling of the second arithmetic unit (A2) is shown in Fig. 2. 




Figure 2: Transition System for Second Arithmetic Unit 



Initially, the unit A2 is in state "idle" (i.e. a hot stand-by state where no output is produced). It 
stays in this state until it is activated by the monitoring unit (predicate "activate" is true). It is then 
in state "sig" as long as there is data available (predicate "signal", which means that sensor S2 produces 
data). If there is no data available, the unit enters the state "noSig", as no signal can be produced. If the 
sensor starts delivering data again, A2 switches back to state "sig". The other modules of the system are 
modeled in a similar way; the system model is then the parallel composition of all the modules. 

2.2 Formal Failure Modeling 

In this scenario, a variety of failure modes is possible. The sensors can omit a signal (S1FailsSig, 
S2FailsSig), making it impossible for the arithmetic unit to process the data correctly. The arithmetic 
units themselves can omit producing output data (A1FailsSig, A2FailsSig). The monitor can fail to 
detect that situation (MonitorFails), either switching when not necessary or not switching when necessary. The 
activation of the A2 unit can fail (A2FailsActivate). All these failure modes must now be integrated into 
the (functional) model of the system. 

The main idea to integrate failures correctly into a system is to separate failure occurrence patterns 
and direct failure effects. This allows for conservative integration of failure modes, i.e. the original 
behavior of the system is still possible if no failure occurs. 

Occurrence patterns are modeled by failure automata. The most basic failure automaton has two 
states, one state yes modeling the presence of the failure and a state no modeling its absence. The 
transition possibilities between the states determine the type of the failure mode. For example, if the 
state yes can never be left once it has become active, the failure mode is called persistent; if the state can 
non-deterministically switch between yes and no, it is transient. More complex failure modeling can, for 
example, incorporate repair or disappearance of the failure after a given time interval. 

The effects of the failure are modeled in the system model, using a predicate failure as the indication 
that a certain failure has appeared (the corresponding failure automaton is not in state "no"). The 
formal system model with the integration of the failure effects is called the extended system model. The 
failure automata are then used in parallel composition with the extended system model. The integration 
of the failure mode "A2FailsActivate" into the model can be seen on the left side of Fig. 3. The extension 
of the functional model (as shown in Fig. 2) is straightforward: whenever the unit A2 should 
be activated, it enters state "sig" (producing an output signal) if and only if the corresponding 
failure automaton is in state "no", i.e. the predicate ¬failure holds. 



Figure 3: Failure Mode A2FailsActivate: Direct effect (left) and failure automaton (right) 

The corresponding transient failure automaton is depicted on the right side of Fig. 3. It is 
non-deterministic, as the transition labels are always "true", which means that at any time it can either stay in the 
current state or leave it. This models a randomly appearing and disappearing (transient) failure mode. A 
more detailed explanation of the general conservative integration of failure modes can be found in [33]. 

2.3 Qualitative Model-Based Safety Analysis 

After the failures and their direct effects are integrated into the system model, model-based safety analysis 
can be conducted on the extended system model. This can be done with DCCA (deductive 
cause-consequence analysis) [31, 32]. DCCA provides a generalization of the FTA minimal cut sets, called 
minimal critical sets, and is well suited to synchronous contexts [16]. The generalization is proven never 
to be worse than minimal cut sets and can often lead to a more accurate analysis. It is based on the 
analysis of the whole system using temporal logics and model checking and does not rely on the manual 
construction of a fault tree. It can be proven to be correct and complete [31], whereas FTA can be complete but 
not correct, and its minimal cut sets can be too pessimistic (e.g. single points of failure instead of necessary 
failure combinations). In order to conduct the analysis, the synchronous parallel state machines are 
transformed into a Kripke structure, which is then used for model checking with the CTL branching-time 
temporal logic [13]. 

Definition 1 (Critical Set / Minimal Critical Set) 

For a system SYS and a set of failure modes $\Delta$, a subset of component failures $\Gamma \subseteq \Delta$ is called critical for 
a system hazard, which is described by a predicate logic formula $H$, if 

$$SYS \models \mathbf{E}(\lambda \;\mathbf{U}\; H) \quad \text{where} \quad \lambda := \bigwedge_{\delta \in (\Delta \setminus \Gamma)} \neg \delta$$

$\Gamma$ is called an mcs (minimal critical set) if $\Gamma$ is critical and no proper subset of $\Gamma$ is critical. 
Informally, the proof obligation means: "There exists (E) a run of the system on which no failure 
mode of the set $\Delta \setminus \Gamma$ appears ($\lambda$) before (U) the hazard (H) appears." The failure modes in the set $\Gamma$ 
can appear in any order before the hazard. 

The actual computation of the minimal critical sets starts with $\Gamma = \emptyset$, which corresponds to functional 
correctness (testing whether the hazard can occur without any failure mode occurring). Then single failures are checked, and then 
the sets are iteratively enlarged, omitting sets with critical subsets. Using this approach only the minimal 
critical sets are computed, because criticality as defined in Def. 1 is monotone. 
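The iterative search just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `is_critical` is a hypothetical stand-in for the model-checker query of Def. 1 (in the paper, an SMV run), and `toy_oracle` is an invented monotone predicate used only to exercise the search; it does not reproduce the case study's results.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

FailureSet = FrozenSet[str]


def minimal_critical_sets(
    failure_modes: Iterable[str],
    is_critical: Callable[[FailureSet], bool],
) -> List[FailureSet]:
    """Search candidate sets Gamma by increasing size, as described for DCCA.

    `is_critical` is a hypothetical stand-in for the model-checker query
    SYS |= E(lambda U H). Criticality is assumed to be monotone.
    """
    modes = sorted(failure_modes)
    found: List[FailureSet] = []
    for size in range(len(modes) + 1):  # size 0 checks functional correctness
        for combo in combinations(modes, size):
            gamma = frozenset(combo)
            if any(mcs <= gamma for mcs in found):
                continue  # superset of a known critical set: cannot be minimal
            if is_critical(gamma):
                # every proper subset was checked earlier and found non-critical
                found.append(gamma)
    return found


def toy_oracle(gamma: FailureSet) -> bool:
    """Hypothetical monotone oracle used only to exercise the search."""
    return {"S1FailsSig", "S2FailsSig"} <= gamma or {"A1FailsSig", "A2FailsSig"} <= gamma


modes = ["S1FailsSig", "S2FailsSig", "A1FailsSig", "A2FailsSig", "MonitorFails"]
print(minimal_critical_sets(modes, toy_oracle))
```

Skipping supersets of already found sets is only sound because criticality is monotone; with a non-monotone oracle the search would have to check every subset.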

In the case study, using the SMV model checker [29] to compute the minimal critical sets with 
the proof obligation of Def. 1 leads to the following minimal critical sets for the hazard "no 
signal at output (O)": both arithmetic units fail ({A1FailsSig, A2FailsSig}); one arithmetic unit and the 
monitor fail ({A1FailsSig, MonitorFails} or {A2FailsSig, MonitorFails}); the primary unit A1 and 
the second sensor fail ({A1FailsSig, S2FailsSig}); the monitor and the second sensor fail ({MonitorFails, 
S2FailsSig}); both sensors fail ({S1FailsSig, S2FailsSig}); the monitor fails and the activation of A2 fails 
({MonitorFails, A2FailsActivate}); and the primary unit fails and the activation of A2 fails ({A1FailsSig, 



3 Probabilistic Failure Mode Modeling 

The qualitative analysis only gives insight into which combinations of failures can lead to a hazard. For an 
accurate estimation of the probability that this happens, however, quantitative methods are required. 
Different types of failures, particularly per-demand and per-time failures, are possible and must be accurately 
modeled. The overall approach of the proposed method for probabilistic model-based safety analysis is 
an extension of the one in Sect. 2: 

1. Modeling: The same transition system as in the qualitative analysis can be used. 

2a. Failure Modeling: The same failure modes can be used. 

2b. Probabilistic Modeling: The modeling of the effects of the failure modes is augmented with failure 
probabilities and failure rates; the integration of the failure effects differs depending on 
the type of failure. 

3. Probabilistic model checking: The probability of the occurrence of the hazard H is computed. 

3.1 Temporal Resolution 

For a realistic estimation of probability, the correct consideration of the passing of time is very important. 
In a discrete-time context, there exists a basic time unit which passes whenever the system performs a 
step (i.e. all parallel finite state machines execute a transition). This is called the temporal resolution $\delta t$ 
of the system. In synchronous parallel systems this will usually be the basic clock of the system. This 
is the main difference between continuous-time Markov chain (CTMC) and discrete-time Markov chain (DTMC) models. 
In a DTMC, if there is a probabilistic choice, then the system performs a step according to the probabilities exactly 
every $\delta t$ time units, whereas for continuous systems it is only possible to reason about the probability of 
reaching a state within a given time $t$. For any possible time $t < \infty$ this probability is always below 1. For DTMC 
models, the time $t$ is always an integral multiple of $\delta t$ of the form $t = k\,\delta t$, $k \in \mathbb{N}$. Therefore, 
for a clocked system with synchronous parallel components (as opposed to interleaved components), like many 
safety-critical systems, DTMC modeling is much better suited than CTMC modeling [19]. 

² On top of the minimal critical sets, temporal ordering information on the failure modes can also be automatically 
extracted from the extended system model; in this case the set {A1FailsSig, MonitorFails} is only critical if the 
monitor fails before the arithmetic unit A1. For details see [15]. 

If asynchronous modeling is needed, e.g. for vehicles which may either move with a (normally distributed) 
velocity or not move at all, then it must be done explicitly by specifying "self-loops". Making 
this explicit allows for modeling asynchronous behavior while preserving the temporal semantics of the 
safety analysis technique. 

3.2 Failure Modeling for Quantitative Safety Analysis 

To get accurate quantitative results, the correct probabilistic modeling of the failure modes, especially 
of their occurrence pattern and their occurrence probability, is very important. First, the type of the 
failure mode must be determined. Two main types of probabilistic occurrence patterns for failure modes 
exist. The first is the per-demand failure probability, which is the probability of the system component 
failing its function at a given demand (comparable to low demand mode in IEC 61508). The second is the 
per-time probability, which is the rate of failures over a given time interval (comparable to high demand 
or continuous mode in IEC 61508). 

Which type of failure modeling is best fitting for a given failure mode can only be decided on a case- 
by-case basis. Sensor failures - for example - will very often be modeled as a per time failure mode, 
because sensors are often active the whole time and are often modeled as a transient failure mode. Other 
failure modes, like the activation of a mechanical device, will often be modeled as a per demand failure, 
as a distinct moment of activation exists when the failure may occur. 

The failure probability has an effect on the occurrence pattern of the failure mode and is therefore 
reflected in the failure automata. For the probabilistic modeling we use a graphical representation of the 
finite state machines used by the probabilistic model checker PRISM [25], which describes probabilistic 
automata. These are finite state machines like those used for qualitative modeling (Sect. 2.2), with 
the difference that there is no non-determinism and the transitions are labeled with both an activation 
condition and a transition probability. 

Transitions are labeled with constraints of the form $(\varphi; p)$, which means: "If $\varphi$ holds, then the 
transition is taken with probability $p$". Omitting $p$ means probability 1. A constraint of the form $(\varphi; p_1) \lor 
(\psi; p_2)$ means that the transition is taken with probability $p_1$ if $\varphi$ holds and with probability $p_2$ if $\psi$ holds. 
As the model is deterministic, $\varphi \land \psi = \mathit{false}$ always holds. As an additional requirement for 
a well-defined DTMC model, for each state and for each activation condition $\varphi$, the outgoing transition 
probabilities must always sum up to 1. This ensures that the transition relation is total. 
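Both requirements (mutually exclusive guards within one label, outgoing probabilities summing to 1 for every activation condition) can be checked mechanically on a small explicit representation. The sketch below assumes a hypothetical encoding of labels as lists of (guard, probability) pairs over boolean variables; it is not PRISM's input format. The example automaton follows the per-demand pattern introduced in Sect. 3.2.2, with the state "yes" assumed to be absorbing.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, bool]
Guard = Callable[[Valuation], bool]
# (source, target, label); a label is a disjunction of (guard; probability) pairs
Transition = Tuple[str, str, List[Tuple[Guard, float]]]


def is_well_formed(transitions: List[Transition], variables: List[str], tol: float = 1e-12) -> bool:
    """Check that each label is deterministic and that, for every source state
    and every valuation, the enabled outgoing probabilities sum to 1."""
    states = {src for src, _, _ in transitions}
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        for state in states:
            total = 0.0
            for src, _, label in transitions:
                if src != state:
                    continue
                enabled = [p for guard, p in label if guard(env)]
                if len(enabled) > 1:
                    return False  # overlapping guards: phi and psi are not exclusive
                total += sum(enabled)
            if abs(total - 1.0) > tol:
                return False  # probabilities for this condition do not sum to 1
    return True


p = 1e-4  # hypothetical per-demand failure probability
per_demand = [
    ("no", "no", [(lambda e: not e["demand"], 1.0), (lambda e: e["demand"], 1 - p)]),
    ("no", "yes", [(lambda e: e["demand"], p)]),
    ("yes", "yes", [(lambda e: True, 1.0)]),  # "yes" assumed absorbing
]
print(is_well_formed(per_demand, ["demand"]))
```

Enumerating all valuations is only feasible for a handful of variables; a real tool would reason symbolically about the guards instead.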

3.2.1 Per-Time Failure Modeling 

A per-time failure can be modeled by adding the failure probability to the transition from the state "no" to 
the state "yes" in a failure automaton, as shown in Fig. 4, since it can occur at any time of the system run. 
Basically, the non-deterministic transitions of the transient failure automaton are replaced by probabilistic 
transitions. 

The probability $p$ is computed from the given per-time failure rate $\lambda$ and the temporal resolution $\delta t$ 
in order to approximate the continuous exponential distribution with parameter $\lambda$ in the discrete-time 
context. This distribution function is shown in Eq. (1). It computes the probability that the failure occurs 
before time $t$ (i.e. the random variable $X$ has a value less than or equal to $t$). This distribution is often 
used for per-time failure modeling, especially in continuous-time models; therefore it is important that it can be 
expressed in the discrete-time context. 
Figure 4: Per-time Failure Automaton (labels: $(\mathit{true}; 1-p)$ on the self-loop of "no", $(\mathit{true}; p)$ from "no" to "yes") 



$$P(X \le t) = \int_0^t \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda t} \qquad (1)$$

In the discrete-time model of DTMCs, the occurrence probability of the failure mode before time 
$t = k\,\delta t$ ($k$ time steps of length $\delta t$), as modeled with the failure automaton (Fig. 4), forms a geometric 
distribution and is shown in Eq. (2). As $p$ is the occurrence probability per time step, $1 - p$ is the probability 
that the failure does not occur in one step, and $P(X > k)$ is the probability that the failure does not occur for $k$ time 
steps. 

$$P(X \le k) = 1 - P(X > k) = 1 - (1 - p)^{k} \qquad (2)$$
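As a sanity check of Eq. (2), the state distribution of the two-state per-time failure automaton can be propagated step by step. This sketch assumes that "yes" is treated as absorbing, so that the probability of being in "yes" after $k$ steps is the probability of a first occurrence within $k$ steps, which is what Eq. (2) describes; the function names are illustrative.

```python
def p_failed_within(p: float, k: int) -> float:
    """Propagate the two-state per-time failure automaton for k steps and
    return the probability of being in "yes" ("yes" treated as absorbing)."""
    p_no, p_yes = 1.0, 0.0  # the automaton starts in state "no"
    for _ in range(k):
        # (true; 1 - p): stay in "no"; (true; p): move from "no" to "yes"
        p_no, p_yes = p_no * (1 - p), p_yes + p_no * p
    return p_yes


def geometric_cdf(p: float, k: int) -> float:
    """Closed form of Eq. (2): P(X <= k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k


print(p_failed_within(1e-3, 500), geometric_cdf(1e-3, 500))
```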

Using the identity $e^{x} = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^{n}$, the continuous exponential distribution can be approximated 
with the discrete geometric distribution as shown in Eq. (3). For longer time intervals, $k$ approximates $n$. 
Then $\lambda\,\delta t$ can be substituted as probability $p$ in the per-time failure automaton (Fig. 4) and in Eq. (2). 

$$1 - e^{-\lambda t} = 1 - \lim_{n\to\infty}\left(1 - \frac{\lambda t}{n}\right)^{n} \approx 1 - (1 - \lambda\,\delta t)^{k} \qquad (3)$$

The left graph in Fig. 5 shows the absolute approximation error $e(t) = \left|\left(1 - e^{-\lambda t}\right) - \left(1 - (1 - \lambda\,\delta t)^{k}\right)\right|$ 
for $\lambda = 10^{-2}\,\mathrm{h}^{-1}$, $\delta t = 1\,\mathrm{s}$ and $t = k\,\delta t$. The maximum of the error is reached at $t = 1/\lambda$, where the 
exponential distribution has the function value $1 - e^{-1} = 1 - \frac{1}{e} \approx 0.63212$. The approximation error at $t = 1/\lambda$ is approximately 
$5.1095 \cdot 10^{-7}$, which is several orders of magnitude lower than the function value. This can also be seen in 
the right graph of Fig. 5, which shows the relative error $\left|\frac{e(t)}{1 - e^{-\lambda t}}\right|$. In both graphs the x-axis is the time 
axis $t$ in hours. The approximation error decreases as $t$ and $k$ increase (for the relative error, after $t = 1/\lambda$). 
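The size of the error at $t = 1/\lambda$ can be recomputed numerically. The sketch below assumes $\lambda = 10^{-2}\,\mathrm{h}^{-1}$ and $\delta t = 1\,\mathrm{s}$ (1/3600 h), values chosen to be consistent with the error stated above; `math.expm1` and `math.log1p` are used so that the difference of two nearly equal probabilities is not swamped by rounding.

```python
import math


def approximation_error(lam: float, dt: float, k: int) -> float:
    """|(1 - e^{-lam t}) - (1 - (1 - lam dt)^k)| with t = k * dt."""
    t = k * dt
    exact = -math.expm1(-lam * t)                       # 1 - e^{-lam t}
    discrete = -math.expm1(k * math.log1p(-lam * dt))   # 1 - (1 - lam dt)^k
    return abs(exact - discrete)


lam = 1e-2                  # assumed failure rate in 1/h
dt = 1 / 3600               # assumed temporal resolution: 1 s expressed in hours
k = round(1 / (lam * dt))   # number of steps until t = 1/lam
print(k, approximation_error(lam, dt, k))
```

Near $t = 1/\lambda$ the error behaves like $e^{-1}\,\lambda\,\delta t / 2$, so halving $\delta t$ roughly halves it.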




Figure 5: Absolute and relative error for discrete approximation 
3.2.2 Per-demand Failure Modeling 

The correct modeling of a per demand failure mode is more complex. Not only the occurrence pattern, 
but also the modeling of the direct effect must be adapted to probabilistic modeling. The challenge is 
that the system which executes a demand and the failure automaton which indicates whether the demand 
succeeds or fails must take a step at the same time and the failure automaton is not allowed to take a 
transition if there is no demand, or else the computed probabilities are not correct. 

For the occurrence pattern, a predicate demand is defined which indicates that there is a demand 
to the safety-critical component. The per-demand failure automaton can only leave the state "no" if 
demand holds, see Fig. 6. The state "no" has two outgoing transitions: a loop, which is taken if there is 
no demand, or with probability $1 - p$ if there is a demand and the demand succeeds ($p$ being 
the per-demand failure probability); and the transition to "yes", which is taken with probability $p$ only if demand holds. 
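The effect of letting the per-demand automaton move only on a demand can be illustrated by computing the probability of having left "no" along a fixed demand trace. The trace and the probability below are hypothetical; the flag `step_only_on_demand=False` models the incorrect variant in which the automaton takes a probabilistic step in every time step.

```python
from typing import Sequence


def p_left_no(trace: Sequence[bool], p: float, step_only_on_demand: bool = True) -> float:
    """Probability that the per-demand failure automaton has left "no" after
    a trace of demand values (one bool per synchronous time step)."""
    p_no = 1.0
    for demand in trace:
        if demand or not step_only_on_demand:
            p_no *= 1 - p  # loop (demand; 1 - p): the demand succeeds
        # without a demand the loop "not demand" is taken with probability 1
    return 1 - p_no


trace = [False] * 97 + [True] * 3  # hypothetical: 3 demands within 100 steps
p = 1e-3                           # hypothetical per-demand failure probability
print(p_left_no(trace, p))                             # about 1 - (1 - p)**3
print(p_left_no(trace, p, step_only_on_demand=False))  # incorrect variant: about 1 - (1 - p)**100
```

The second value is more than thirty times larger than the first, which shows how strongly an automaton that steps without a demand would overestimate the failure probability.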

Figure 6: Per-demand failure automaton (labels: $\neg\mathit{demand} \lor (\mathit{demand}; 1-p)$ on the self-loop of "no", $(\mathit{demand}; p)$ from "no" to "yes") 

The integration of the direct effect of a per-demand failure into the system model is a bit more complicated. 
It is illustrated via the integration of the failure mode A2FailsActivate of the second arithmetic unit 
of the example in Sect. 2.2. First, the predicate activate, which means that the hot stand-by mode must be 
left, is used to define the demand as demand := activate, i.e. the demand is the activation command for 
the secondary arithmetic unit. The activation can only fail if it is actually requested; therefore it is modeled 
as a per-demand failure. 

Now if the demand holds, the failure automaton and the system can make a transition in parallel. 
Therefore the information whether the demand has been met is available after the transition has been 
taken. In order to get the timing correct, an additional state is introduced which represents either failure 
or success of the demand, depending on the state of the failure automaton. 

For the example this can be seen in Fig. 7. The demand (activate) is possible in state idle. The 
original successor states of idle were sig if the demand was successful and idle if the demand failed (see 
Fig. 3). For the probabilistic modeling, the state idle' is added to represent a failed demand if at the same 
time the failure automaton is in state "yes", and a successful demand if the failure automaton is in state 
"no". To preserve the original behavior of the system, the transitions of idle and sig must be added to 
the state idle' in conjunction with the predicates in(idle) or in(sig), which are defined as shown in Eq. (4) 
and (5). 



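$$\mathit{in}(\mathit{idle}) := \mathit{state} = \mathit{idle} \lor (\mathit{state} = \mathit{idle}' \land \mathit{failure} = \mathit{yes}) \qquad (4)$$
$$\mathit{in}(\mathit{sig}) := \mathit{state} = \mathit{sig} \lor (\mathit{state} = \mathit{idle}' \land \mathit{failure} = \mathit{no}) \qquad (5)$$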

These predicates must then also be exported from the modules and substituted in all other places 
where state = idle or state = sig appears (predicates for transitions, logic formulas etc.). If this is done, 
the observable behavior of the automaton is the same as in the original failure effect modeling as defined 
in [33] and shown in Fig. 3 for the example, but with a correct modeling of the per-demand failure 
occurrence pattern and direct effect. 
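One synchronous step of A2 (with the added state idle') composed with the failure automaton shows how in(idle) and in(sig) recover the observable behavior. This is a simplified sketch under stated assumptions: only the demand out of idle is modeled, every other configuration is left unchanged, and the value p = 0.01 is hypothetical.

```python
from typing import Dict, Tuple

Joint = Dict[Tuple[str, str], float]  # (state of A2, failure automaton state) -> probability


def activation_step(joint: Joint, activate: bool, p: float) -> Joint:
    """One synchronous step of A2 with the added state idle' in parallel with the
    per-demand failure automaton of A2FailsActivate. Simplification: only the
    demand out of (idle, no) is modeled; all other configurations stay unchanged."""
    out: Joint = {}

    def add(key: Tuple[str, str], prob: float) -> None:
        out[key] = out.get(key, 0.0) + prob

    for (state, failure), prob in joint.items():
        if state == "idle" and failure == "no" and activate:
            add(("idle'", "no"), prob * (1 - p))  # demand succeeds
            add(("idle'", "yes"), prob * p)       # demand fails
        else:
            add((state, failure), prob)
    return out


def in_idle(state: str, failure: str) -> bool:
    return state == "idle" or (state == "idle'" and failure == "yes")  # Eq. (4)


def in_sig(state: str, failure: str) -> bool:
    return state == "sig" or (state == "idle'" and failure == "no")  # Eq. (5)


joint = activation_step({("idle", "no"): 1.0}, activate=True, p=0.01)
p_sig = sum(pr for (s, f), pr in joint.items() if in_sig(s, f))
p_idle = sum(pr for (s, f), pr in joint.items() if in_idle(s, f))
print(p_sig, p_idle)
```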
3.2.3 General Per-Demand Failure Integration 

The last section illustrated the integration of a per-demand failure mode for the given example. In general, 
this integration first requires defining what constitutes a "demand" for a given failure mode. This 
typically results in a state formula which divides the set $S$ of all states of the system into states where there 
is a demand for the functionality of the component and those where there is no demand. Now, for all 
states $s \in S$ where a demand to a component exists, let $A_s \subseteq S$ be the set of direct successor states if 
the demand was successful and $B_s \subseteq S$ the set of direct successor states if it was unsuccessful, i.e. 

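$$A_s := \{a \in S \mid \exists \text{ transition from } s \text{ to } a \text{ with activation condition } \mathit{demand} \land \mathit{failure} = \mathit{no}\}$$

$$B_s := \{b \in S \mid \exists \text{ transition from } s \text{ to } b \text{ with activation condition } \mathit{demand} \land \neg(\mathit{failure} = \mathit{no})\}$$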

There can also be successor states of $s$ that are not in $A_s \cup B_s$: the successors if there was no demand 
to the safety-critical system (i.e. where demand, as in the definition of $A_s$ and $B_s$, does not hold). These states 
do not play any role in the following construction, but they are one of the main reasons why the construction is 
required: if there is no demand, and therefore the successor of $s$ is not in $A_s \cup B_s$, the 
failure automaton is not allowed to make a transition. 
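Computing $A_s$ and $B_s$ from an explicit transition list is straightforward when activation conditions range over a few boolean predicates. This Python sketch assumes a hypothetical encoding in which each activation condition is a function of a valuation and the boolean variable `failure` stands for $\neg(\mathit{failure} = \mathit{no})$; the guards for A2 are read off the description of Fig. 3.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, bool]
# (source, target, activation condition); "failure" is True iff failure != no
Transition = Tuple[str, str, Callable[[Env], bool]]


def successor_sets(
    transitions: List[Transition],
    s: str,
    demand: Callable[[Env], bool],
    variables: List[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (A_s, B_s): successors of s under a demand with failure = no
    (successful demand) and with failure != no (failed demand)."""
    a_s: Set[str] = set()
    b_s: Set[str] = set()
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if not demand(env):
            continue  # successors without a demand are not part of A_s or B_s
        for src, dst, condition in transitions:
            if src == s and condition(env):
                (b_s if env["failure"] else a_s).add(dst)
    return a_s, b_s


# Direct-effect model of A2 in state idle, as described for Fig. 3 (assumed encoding)
a2_idle = [
    ("idle", "sig", lambda e: e["activate"] and not e["failure"]),
    ("idle", "idle", lambda e: not e["activate"] or (e["activate"] and e["failure"])),
]
print(successor_sets(a2_idle, "idle", lambda e: e["activate"], ["activate", "failure"]))
```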

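Let $T_{A_s} := \{\text{transitions from } s \text{ to } d \text{ with } d \in A_s\}$ and $\Phi_{A_s} := \{\varphi \mid \varphi \land \mathit{failure} = \mathit{no} \text{ is the activation condition of some } t \in T_{A_s}\}$, 
and define $T_{B_s}$ and $\Phi_{B_s}$ analogously; let $T := T_{A_s} \cup T_{B_s}$ and $\Phi := \Phi_{A_s} \cup \Phi_{B_s}$. As the model 
is required to be deterministic, $\forall \varphi_i, \varphi_j \in \Phi_{A_s}, i \neq j: \varphi_i \land \varphi_j = \mathit{false}$ holds; the same holds for $\Phi_{B_s}$. Now 
demand can be defined as $(\mathit{state} = s) \land \bigvee_{\varphi \in \Phi} \varphi$. 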

To integrate a per-demand failure, the following steps must be executed for each automaton $M$ in 
which direct effects of a failure mode are modeled, resulting in an automaton $M'$: 

1. Define $M'$ with the same states as $M$. All transitions except those starting from state $s$ and going to 
states in $A_s \cup B_s$ are kept. 

2. Define a new state $s'$ in $M'$ which represents being in any state $d \in A_s \cup B_s$, and add a transition from $s$ 
to $s'$ whose activation condition is demand. 

3. Define a decide automaton which is used to specify which state $d \in A_s \cup B_s$ the state machine $M'$ 
is really in; the set of possible states of the decide automaton is $(A_s \times B_s) \cup \{\mathit{undef}\}$. 

4. Add failure probabilities to the corresponding failure automaton as in Fig. 6. 

5. For each transition from $d \in A_s \cup B_s$ to a successor $d'$ with activation condition $\psi$, add a transition 
from $s'$ to $d'$ in $M'$ with the activation condition $\psi \land \mathit{in}(d)$. 

6. In the decide automaton, let undef be the initial state, and add a transition to a state $[a,b] \in 
A_s \times B_s$ if the transition from $s$ to $a$ is activated when $\mathit{failure} = \mathit{no}$ holds and the transition from $s$ to $b$ 
is activated when $\neg(\mathit{failure} = \mathit{no})$ holds. 

7. Also add the corresponding transitions between the states of $A_s \times B_s$ in the decide automaton if a 
demand is possible in two consecutive time steps. 

This construction allows the failure automaton to make a transition from no iff demand holds; at the same time, $M'$ 
makes a transition from $s$ to $s'$, and the respective decide automaton makes a transition to one of its states. 
For each of the original successor states $d \in A_s \cup B_s$, the predicate $M' = d$ is no longer sufficient to characterize 
the fact that $M'$ is in state $d$. Therefore the predicate $\mathit{in}(d)$ is introduced and replaces the occurrences of 
$M' = d$, $d \in A_s \cup B_s$: for $d \in A_s$ as shown in Eq. (6), and for $d \in B_s$ as shown in Eq. (7). 

$$\mathit{in}(d) := M' = d \lor \big(M' = s' \land \mathit{decide} \in \{d\} \times B_s \land \mathit{failure} = \mathit{no}\big) \qquad (6)$$
$$\mathit{in}(d) := M' = d \lor \big(M' = s' \land \mathit{decide} \in A_s \times \{d\} \land \neg(\mathit{failure} = \mathit{no})\big) \qquad (7)$$

Because of the determinism and the requirement of a total transition relation, the successor state of 
$s$ is always well-defined, as for each label predicate $l$ (the atomic predicates of $\Phi_{A_s} \cup \Phi_{B_s}$) there is 
exactly one transition to a state if $\mathit{failure} = \mathit{no}$ and exactly one transition if $\neg(\mathit{failure} = \mathit{no})$. 

In the example of Sect. 3.2.2, the decide automaton has been omitted in order to illustrate the idea of 
per-demand failure mode integration without too much complexity: as idle' only represents two different 
states, the distinction using the failure predicate is sufficient. For more complex systems 
with several possible per-demand failures, however, the general construction using the additional decide automaton 
is necessary. 

4 Quantitative Safety Analysis 

The probabilistic models of Sect. 3 can now be used to compute the overall hazard probabilities of the 
system using probabilistic logics and probabilistic model checkers. 

4.1 Probabilistic Model Checking 

For discrete-time probabilistic models, probabilistic computation tree logic (PCTL) is used; for 
continuous-time models, continuous stochastic logic (CSL) is used. Their respective 
semantics can be found in [26, 18]. 

PCTL is a probabilistic variant of CTL. Instead of Kripke structures as in CTL model checking, 
labeled Markov chains are the system models for PCTL model checking. Both are very similar; the 
main difference is that in Kripke structures there is either a transition between two states or there is none, 
whereas in labeled Markov chains there is either no transition between two states or the transition has an assigned probability. 
PCTL formulas are of the form 

$$P_{\bowtie p}[\varphi], \qquad \bowtie\; \in \{<, \le, >, \ge\}$$

and assert that for the probability $P$ of the path formula $\varphi$ the relation $P \bowtie p$ holds. For 
example, $P_{\le p}[\varphi]$ holds for a given system if the formula $\varphi$ holds with a probability $P$ and $P \le p$. Intuitively, this 
means that the probability that $\varphi$ holds on a system trace starting in the initial state is less than or equal 
to $p$. The other operators are defined analogously. 

It is also possible to (automatically) calculate the probability directly. The corresponding syntactic 
expression is $P_{=?}[\varphi]$. Instead of giving a result true or false, this returns the actual probability that the 
formula holds on a given system trace. 

The path operator available in PCTL is the U operator. The "globally" (G) and "eventually" (F) operators 
can be derived from it: $G\varphi := \neg(\mathit{true}\;\mathbf{U}\;\neg\varphi)$ and $F\varphi := \mathit{true}\;\mathbf{U}\;\varphi$. Negation of a formula is 
expressed by using $P_{<q}$ instead of $P_{\ge q}$. In order to compute the probability of the negation of a formula $\varphi$, $1 - P_{=?}[\varphi]$ 
can be used. The existence of a trace on which $\varphi$ holds can be checked with $P_{>0}[\varphi]$. 

The exact definitions of the semantics and probability measures can be found in [26, 18]. Algorithms 
to check PCTL formulas can be found in [26]. Efficient model checking algorithms are implemented in 
tools like PRISM [25] or MRMC [22]. A comparison of the efficiency and memory usage of different 
probabilistic model checkers, including stochastic simulation-based model checkers using hypothesis 
testing, can be found in [21]. 

4.2 Computation of Hazard Probabilities 

PCTL proof obligations can then be used together with the probabilistic model to compute the precise 
probability that the hazard occurs, given the per-time failure rates and the per-demand failure probabili- 
ties of the failure modes. 

The semantics of PCTL is defined for infinite system traces. This cannot be used directly to compute the hazard probability, as the result would always be either 1 if the hazard can occur or 0 if it cannot.

Instead, the analysis is conducted for a given time range: what is computed is the probability that the hazard occurs within $k$ time steps of length $\delta t$. To accomplish this, the bounded until operator is used. If $\phi\ U^{\leq k}\ \psi$ holds, then there exists a bound $j \leq k$ such that $\psi$ becomes true after $j$ steps and $\phi$ is true for all time steps $i < j$. Using the bounded until operator, the occurrence probability of the hazard $H$ within time $t = k\,\delta t$ is computed via Eq. (8), which is essentially the bounded PCTL DCCA formula (as defined earlier) for an empty minimal critical set.

$P_k(H) := P_{=?}[\mathit{true}\ U^{\leq k}\ H]$ (8)
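The following Python sketch evaluates Eq. (8) for a hypothetical system of two redundant components A and B with permanent per-time failures (per-step failure probability $p$), where the hazard $H$ occurs once both have failed. It iterates the transition relation $k$ times, one matrix-vector multiplication per step. Because the two failure times are independent, the result can be checked against the closed form $(1-(1-p)^k)^2$. Letting $k$ grow also illustrates why the unbounded probability is not informative here: it tends to 1. The model and all names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def two_component_chain(p: float) -> Dict[int, List[Tuple[int, float]]]:
    """Hypothetical DTMC: 0 = no failure, 1 = only A failed, 2 = only B failed, 3 = hazard H."""
    q = 1.0 - p
    return {
        0: [(0, q * q), (1, p * q), (2, q * p), (3, p * p)],
        1: [(1, q), (3, p)],  # A stays failed; H occurs once B also fails
        2: [(2, q), (3, p)],  # symmetric case
        3: [(3, 1.0)],        # H is absorbing
    }


def hazard_probability(p: float, k: int, hazard: int = 3, init: int = 0) -> float:
    """P_k(H) = P=?[true U<=k H] from the initial state, by k backward iterations."""
    chain = two_component_chain(p)
    x = {s: 1.0 if s == hazard else 0.0 for s in chain}
    for _ in range(k):  # one matrix-vector multiplication per time step
        x = {s: 1.0 if s == hazard else sum(pr * x[t] for t, pr in succ)
             for s, succ in chain.items()}
    return x[init]


if __name__ == "__main__":
    p, k = 0.01, 50
    print(hazard_probability(p, k), (1 - (1 - p) ** k) ** 2)
    print(hazard_probability(p, 5000))  # a long horizon drives P_k(H) toward 1
```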

It is important to note that the direct analogue of the DCCA formula, $P_k(\Gamma) := P_{=?}[\Gamma\ U^{\leq k}\ H]$ with $\Gamma \neq \emptyset$, does not compute the probability that the failure modes in the minimal critical set $\Gamma$ cause the hazard $H$. The reason is that $\Gamma\ U^{\leq k}\ H$ limits the set of traces to those on which only the failure modes in $\Gamma$ can appear. This is adequate for analyzing whether these failure modes are sufficient to cause $H$, as is done in DCCA (which is a worst-case analysis).

Nevertheless, for probabilistic analysis the probability of all traces on which $\Gamma$ causes $H$ is important, i.e. also those traces on which $\Gamma$ causes $H$ while other failure modes appear without any direct effect on the occurrence of $H$. For example, if A2 fails transiently before the monitor fails and then A1 fails, the cause of "no signal output" is the minimal critical set {A1FailsSig, MonitorFails}. But the probability of this trace, on which the transient failure of A2 also appears, would not be considered, as $\Gamma\ U^{\leq k}\ H$ limits the traces to those on which only A1FailsSig and MonitorFails appear.
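The effect can be reproduced on a hypothetical three-component model. In the Python sketch below, components A and B fail permanently per time step and together cause $H$, while C fails independently without any effect on $H$ (playing the role of the transient A2 failure above). With $\Gamma = \{A, B\}$, the restricted formula $P_{=?}[\Gamma\ U^{\leq k}\ H]$ only follows traces on which C has not yet failed, so it yields a smaller value than $P_{=?}[\mathit{true}\ U^{\leq k}\ H]$, even though C never influences $H$. The model, its parameters and all names are illustrative assumptions.

```python
from itertools import product
from typing import Dict, Iterator, Tuple

State = Tuple[int, int, int]  # failure bits (a, b, c); 1 = failed


def until_probability(p: float, pc: float, k: int, restrict: bool) -> float:
    """P=?[phi U<=k H] from the fault-free state.

    H holds once A and B have both failed. phi is 'true' when restrict is
    False and 'C has not failed' (the Gamma constraint for Gamma = {A, B})
    when restrict is True.
    """
    states = list(product((0, 1), repeat=3))

    def bit_prob(old: int, new: int, pf: float) -> float:
        # Permanent failures: a failed component never recovers.
        if old == 1:
            return 1.0 if new == 1 else 0.0
        return pf if new == 1 else 1.0 - pf

    def successors(s: State) -> Iterator[Tuple[State, float]]:
        for t in states:
            pr = bit_prob(s[0], t[0], p) * bit_prob(s[1], t[1], p) * bit_prob(s[2], t[2], pc)
            if pr > 0.0:
                yield t, pr

    def hazard(s: State) -> bool:
        return s[0] == 1 and s[1] == 1

    def phi(s: State) -> bool:
        return not restrict or s[2] == 0

    x: Dict[State, float] = {s: 1.0 if hazard(s) else 0.0 for s in states}
    for _ in range(k):
        x = {s: 1.0 if hazard(s)
             else sum(pr * x[t] for t, pr in successors(s)) if phi(s)
             else 0.0
             for s in states}
    return x[(0, 0, 0)]


if __name__ == "__main__":
    full = until_probability(0.05, 0.05, 40, restrict=False)
    gamma = until_probability(0.05, 0.05, 40, restrict=True)
    print(full, gamma)  # gamma < full: traces on which C also fails are dropped
```

Setting the failure probability of C to zero makes both values coincide, which confirms that the gap comes only from the excluded traces.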

Instead of pure DTMC models as shown here, it is also possible to use MDPs (Markov decision processes), which allow a non-deterministic choice between different probability distributions in a state. Instead of exact probabilities, MDP analysis yields minimal and maximal probabilities. For safety analysis, the maximal (worst-case) probability of the hazard is of interest, as shown in Eq. (9).

$P^{\max}_k(H) := P_{\max=?}[\mathit{true}\ U^{\leq k}\ H]$ (9)
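For MDPs, the bounded until probability can be computed by the same backward iteration, except that in every step the best choice (for $P^{\max}$) or the worst choice (for the minimum) among the distributions available in a state is taken. The Python sketch below is a minimal illustration of this value iteration; the two-choice example, in which a hypothetical environment may select a per-step failure probability of 0.1 or 0.2, and all names are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Distribution = List[Tuple[int, float]]
# Each state offers a non-deterministic choice among several distributions.
Mdp = Dict[int, List[Distribution]]


def bounded_reach(mdp: Mdp, target: Set[int], k: int,
                  opt: Callable[[Iterable[float]], float] = max) -> Dict[int, float]:
    """Optimal (max or min over schedulers) probability of true U<=k target."""
    x = {s: 1.0 if s in target else 0.0 for s in mdp}
    for _ in range(k):
        x = {s: 1.0 if s in target
             else opt(sum(p * x[t] for t, p in dist) for dist in choices)
             for s, choices in mdp.items()}
    return x


# Hypothetical MDP: in state 0 the environment may pick failure prob. 0.1 or 0.2.
example_mdp: Mdp = {
    0: [[(0, 0.9), (1, 0.1)], [(0, 0.8), (1, 0.2)]],
    1: [[(1, 1.0)]],  # hazard state, absorbing
}

if __name__ == "__main__":
    print(bounded_reach(example_mdp, {1}, 2)[0])           # worst case: 1 - 0.8**2
    print(bounded_reach(example_mdp, {1}, 2, opt=min)[0])  # best case: 1 - 0.9**2
```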

4.3 Quantitative Results 

For the analysis of the example case study, all failure modes were modeled as per-time failure modes, with the exception of the per-demand failure A2 Fails Activate. The minimal critical sets are given in Sect. 2.3.

Assuming a failure rate of $10^{-\ldots}$ for each per-time failure mode and a temporal resolution of $\delta t = 10\,\mathrm{ms}$, this translates to a per-step failure probability of $p_{fail} = 2.7 \cdot 10^{-\ldots}$ for the per-time failure modes. For the failure mode "A2ActivateFails", a per-demand failure probability of $10^{-\ldots}$ is assumed.
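The conversion from a failure rate to a per-step probability can be sketched as follows, under the assumption (not stated in this section) that failure times are exponentially distributed, so that $p_{fail} = 1 - e^{-\lambda\,\delta t} \approx \lambda\,\delta t$ for small $\lambda\,\delta t$. The rate used below is a hypothetical placeholder and does not reproduce the figures of the case study; the step count confirms that one hour at $\delta t = 10\,\mathrm{ms}$ corresponds to $k = 360000$.

```python
import math


def per_step_probability(rate_per_hour: float, dt_seconds: float) -> float:
    """Per-step failure probability 1 - exp(-lambda * dt), assuming exponential failure times."""
    dt_hours = dt_seconds / 3600.0
    # expm1 keeps precision when lambda * dt is tiny
    return -math.expm1(-rate_per_hour * dt_hours)


def steps_for(duration_seconds: float, dt_seconds: float) -> int:
    """Number of discrete time steps k covering the given duration."""
    return round(duration_seconds / dt_seconds)


if __name__ == "__main__":
    lam = 1e-3  # hypothetical failure rate in 1/h, for illustration only
    dt = 0.010  # temporal resolution of 10 ms
    print(per_step_probability(lam, dt))  # close to lam * dt / 3600
    print(steps_for(3600.0, dt))          # steps for a one-hour mission
```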

Using PRISM, the probability of the hazard "no output delivered" can be computed for $k = 360000$ (i.e. a running time of 1 h) using the probabilistic modeling described in Sect. 3. The result is shown in Eq. (10). The analysis was conducted on a 3 GHz AMD64 machine and took ca. 140 s to complete. In this case, the dominant factor in the analysis time is the large number of iterations, as $k$ matrix-vector multiplications are required. This can very likely be made much faster using new approaches to probabilistic model checking [?].

$P_k(H) = 2.964375 \cdot 10^{-\ldots}$ (10)

If a quantitative method based on the a-posteriori analysis of qualitative results, such as FTA, is used, the estimate of the actual hazard probability is either very pessimistic (if the dependency on the ordering is not considered) or can become very complex to obtain. For example, to increase the accuracy, the system model must be analyzed further and the dependencies between the effects of the various failure modes must be explored (the model is based on finite state machines, and time passes when a state changes, which can then be detected in the next time step). In addition, as many of the possible occurrence patterns of the failure modes as feasible must be enumerated to improve the result. In contrast, the probabilistic analysis does this automatically and is much less error-prone than such an a-posteriori analysis of the model.

5 Related Work 

Several approaches to quantitative safety analysis exist that rely mainly on the results of a preceding qualitative analysis. This can be either fault tree analysis (FTA) [?] or a method that computes the critical failure combinations directly from a model. One example is DCCA [?]; other methods relying on fault injection were developed in the ESACS project [10], such as the FSAP/NuSMV-SA framework [9], or in the ISAAC project [3], where SCADE was used for modeling and safety analysis [2]. To estimate the global hazard probability, the "FTA formula" (Eq. (11)) is used, which gives an upper bound for the hazard probability.

$P(H) \leq \sum_{\Gamma \in \mathit{MCS}} \prod_{\delta \in \Gamma} P(\delta)$ (11)

Clearly, no ordering information is considered in this formula, as it relies on all probabilities being independent, so the estimate can be very pessimistic. With dynamic fault trees (DFT) [39], ordering information and dependencies can be analyzed; these can partly be deduced directly from the models [?, 15], and DFTs can give more accurate results than FTA. Nevertheless, this is still an a-posteriori analysis of qualitative results, with the corresponding disadvantages.
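A small Python sketch can illustrate how the FTA formula over-approximates. Assuming independent basic events and ignoring ordering entirely, it compares the sum of cut-set products with the exact probability that at least one minimal cut set occurs completely, obtained by enumerating all combinations of basic events. The cut sets and probabilities are hypothetical. Overlapping cut sets (here, event a occurs in both) are already enough to make the bound pessimistic, before any ordering information is taken into account.

```python
from itertools import product
from math import prod
from typing import Dict, FrozenSet, List

CutSets = List[FrozenSet[str]]


def fta_upper_bound(cut_sets: CutSets, p: Dict[str, float]) -> float:
    """Sum over minimal cut sets of the product of their basic-event probabilities."""
    return sum(prod(p[e] for e in cs) for cs in cut_sets)


def exact_top_event(cut_sets: CutSets, p: Dict[str, float]) -> float:
    """Exact P(some cut set fully occurs) for independent events, by enumeration."""
    events = sorted(set().union(*cut_sets))
    total = 0.0
    for values in product((False, True), repeat=len(events)):
        occurred = dict(zip(events, values))
        if any(all(occurred[e] for e in cs) for cs in cut_sets):
            total += prod(p[e] if occurred[e] else 1.0 - p[e] for e in events)
    return total


if __name__ == "__main__":
    mcs = [frozenset({"a", "b"}), frozenset({"a", "c"})]  # hypothetical cut sets
    probs = {"a": 0.1, "b": 0.1, "c": 0.1}
    print(fta_upper_bound(mcs, probs), exact_top_event(mcs, probs))
```

Enumeration is exponential in the number of basic events, so this comparison is only feasible for toy examples.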
In the COMPASS project [8], the FSAP/NuSMV-SA [9] framework is combined with the MRMC [?] probabilistic model checker to allow the analysis of aerospace systems specified in the SLIM [12] language, which is inspired by AADL [?]. The hybrid behavior of the SLIM models and all internal transitions are removed by lumping, and the resulting interactive Markov chain is analyzed with MRMC.

Another approach to probabilistic safety analysis is probabilistic failure modes and effects analysis (FMEA) [14], where per-time failure behavior is integrated into system models via failure injection. Afterwards, FMEA tables, which are commonly used in industrial safety analysis processes, are computed.

Both of these approaches use continuous-time Markov chain (CTMC) models and only support per-time failure modes modeled with failure rates. The semantically correct integration of per-demand failure modes into continuous-time models is difficult at best and in general not possible (non-deterministic behavior may occur). But it would be very interesting to see how these methods, especially the SLIM modeling language, can be used for synchronous parallel systems and discrete-time models.

In the AVACS project [35], another model-based safety analysis approach based on statecharts was developed [6]. It also uses continuous-time models, but an additional difference is that the hazard probability is analyzed for each minimal cut set. As explained in Sect. 4.2, the probability that a given set causes the hazard is not easy to compute. Therefore, this is not done in [6]; instead, an "importance analysis" is conducted, which analyzes single critical combinations and rates them according to their contribution to the global hazard probability.

6 Conclusion and Future Work 

We showed how model-based safety analysis can be conducted on extended system models with probabilistic failure behavior. Both per-demand and per-time failure modes can be integrated, and the overall probability of a hazard occurring in a given time interval can be computed. This quantitative analysis is well suited to synchronous parallel systems.

The method has been applied to much larger case studies with more than $10^{\ldots}$ states, and due to recent developments in probabilistic model checking, e.g. using the GPU [7], parallel model checking [?], abstraction techniques [24, 4, 20] and simulation-based stochastic hypothesis testing [43], still larger systems will likely become analyzable in the future. Hypothesis testing is especially interesting, as it allows giving probabilistic results with confidence intervals based on several simulation runs. Using it, probabilistic analysis can give reliable results even if complete model checking is not possible.
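The following Python sketch illustrates the simulation-based idea on a hypothetical two-component model (A and B fail permanently with per-step probability $p$; $H$ holds once both have failed). Note that it performs statistical estimation with a Chernoff-Hoeffding bound rather than sequential hypothesis testing: with $n \geq \ln(2/\delta) / (2\varepsilon^2)$ independent runs, the estimate lies within $\pm\varepsilon$ of the true probability with confidence at least $1-\delta$. Parameters and names are illustrative assumptions.

```python
import math
import random


def hoeffding_samples(eps: float, delta: float) -> int:
    """Runs needed so that P(|estimate - p| > eps) <= delta (Chernoff-Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def simulate_run(rng: random.Random, p: float, k: int) -> bool:
    """One simulated trace of k steps; True if the hazard occurs."""
    a_failed = b_failed = False
    for _ in range(k):
        if not a_failed and rng.random() < p:
            a_failed = True
        if not b_failed and rng.random() < p:
            b_failed = True
        if a_failed and b_failed:
            return True
    return False


def estimate_hazard(p: float, k: int, eps: float, delta: float, seed: int = 0):
    """Estimate of P_k(H) with confidence interval [est - eps, est + eps]."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    hits = sum(simulate_run(rng, p, k) for _ in range(n))
    return hits / n, n


if __name__ == "__main__":
    est, n = estimate_hazard(p=0.05, k=20, eps=0.02, delta=1e-6)
    exact = (1 - (1 - 0.05) ** 20) ** 2
    print(f"{n} runs: {est:.4f} +/- 0.02 (exact {exact:.4f})")
```

The number of runs depends only on $\varepsilon$ and $\delta$, not on the size of the state space, which is why such methods remain applicable when exhaustive model checking is not.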

In any case, this model-based analysis can give insights into the properties of a safety-critical system in the early phases of the design, where the cost of redesign is lowest.

A framework is currently being developed that allows both qualitative and quantitative analysis on the same model, automating the complex integration of per-demand failures presented here. The given approximation of the continuous distribution and the time discretization worked well in the considered case studies. Nevertheless, an additional estimate of the accuracy of the results, depending on the temporal resolution and the discrete probabilities, is desirable, especially for probabilistically modeled environment behavior other than failure modes. Direct integration into tools like Topcased [40] or SCADE would be very beneficial; this requires additional work on the transformation from these tools into our framework. But the semantic proximity of the SCADE execution model and the already accomplished integration of DCCA into SCADE [16] are evidence that this is possible.